SUBJECTIVE BAYESIAN METHODS FOR RULE-BASED INFERENCE SYSTEMS

R. O. Duda, P. E. Hart, and N. J. Nilsson

ABSTRACT 

The general problem of drawing inferences from uncertain or incomplete evidence has invited a variety of technical approaches, some mathematically rigorous and some largely informal and intuitive. Most current inference systems in artificial intelligence have emphasized intuitive methods, because the absence of adequate statistical samples forces a reliance on the subjective judgment of human experts. We describe in this paper a subjective Bayesian inference method that realizes some of the advantages of both formal and informal approaches. Of particular interest are the modifications needed to deal with the inconsistencies usually found in collections of subjective statements.


Index Terms

Inference, Bayes rule, artificial intelligence, production systems, rule-based systems, subjective probability
I INTRODUCTION


One of the characteristics of human reasoning is the ability to form useful judgments from uncertain and incomplete evidence. This ability is needed not only for everyday activities, which people would normally never formalize, but also for tasks such as medical diagnosis or securities analysis, which have been subjected to formal treatment.

Because the general need to form judgments from incomplete data is so widespread, many techniques have been developed to aid or supplant people in this task. Probability theory and statistics provide a powerful framework for dealing with many inference problems [1,2]. In standard approaches, the link between alternative hypotheses and relevant evidence is represented by conditional or joint probabilities that are estimated from statistical samples. If the number of alternative hypotheses and the amount of relevant evidence are not too great, and if the available sample is sufficiently large, then probability and statistics furnish the preferred analytical tools. However, when many kinds of evidence simultaneously bear on an hypothesis, traditional statistical approaches become inappropriate, because the number of joint probabilities that must be estimated grows so rapidly that the estimation problems become unmanageable.

Recent work in artificial intelligence has suggested other approaches to the problem of resolving hypotheses on the basis of a mass of uncertain evidence. Among the most attractive are rule-based systems, which use a large body of inference rules, supplied by experts, to provide the knowledge needed to distinguish among competing hypotheses [3-6]. Each inference rule defines the role of a particular set of evidence in resolving a particular hypothesis. Typically, an ad hoc scoring function is used to combine the effects of collections of uncertain evidence acting through several inference rules on the same hypothesis. Thus, rule-based systems attempt to substitute judgments distilled from long experience for joint probabilities estimated from prohibitively large samples.

Our purpose in this paper is to describe a subjective Bayesian technique that can be used in place of ad hoc scoring functions in rule-based inference systems. Our intent is to retain insofar as possible the well-understood methods of probability theory, introducing only those modifications needed because we are dealing with networks of subjective inference rules. The scope of the paper is limited; we shall not discuss here the more general issues of representation and control that must be faced when designing a complete rule-based inference system.

II FUNDAMENTALS

In a rule-based inference system, the rules are typically of the form

If $E_1$ and $E_2$ and $\ldots$ and $E_n$
then $H$

where $E_i$ ($i = 1, \ldots, n$) is the $i$th piece of evidence and $H$ is an hypothesis suggested by the evidence. Each inference rule has a certain strength, measured by parameters that will be defined later. For now it suffices to say that the greater the strength, the greater the power of the evidence to confirm the hypothesis. In most applications, the rules and their strengths are obtained by carefully interviewing experts.
The individual pieces of evidence (the $E_i$) and the hypothesis ($H$) of a rule are propositional statements. Rather than being known to be absolutely true or false, these propositional statements may have uncertain truth values. In this paper we shall represent these uncertainties by probabilities, so that associated with each propositional statement is a corresponding probability value.

To simplify matters, we shall assume (without loss of generality) that each rule has only a single propositional statement as evidence on its left-hand side. To reduce a conjunction to a single statement, we need a method for computing the joint probability $P(E_1, \ldots, E_n)$ from the individual probabilities $P(E_i)$. Two simple alternatives are to assume independence of the $E_i$, which gives $P(E_1, \ldots, E_n) = \prod_i P(E_i)$, or to use the fuzzy-set computation $P(E_1, \ldots, E_n) = \min_i P(E_i)$. More generally, the left-hand side of a rule could contain an arbitrary logical expression $E$. The results of this paper do not depend on how the probability of $E$ is computed.
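The two conjunction rules just mentioned can be compared with a small sketch. It is written in Python purely for illustration; the function names and the three probabilities are hypothetical and are not prescribed by the paper.

```python
import math

def conjunction_independent(probs):
    """P(E_1, ..., E_n) under the assumption that the E_i are independent."""
    return math.prod(probs)

def conjunction_fuzzy(probs):
    """Fuzzy-set rule: P(E_1, ..., E_n) = min_i P(E_i)."""
    return min(probs)

# Hypothetical probabilities for three conjuncts on a rule's left-hand side.
p = [0.9, 0.8, 0.7]
print(conjunction_independent(p))  # about 0.504
print(conjunction_fuzzy(p))        # 0.7
```

The independence rule can never exceed the fuzzy rule, since each factor is at most 1; which is more appropriate depends on how the conjuncts are related, and, as noted above, nothing that follows depends on the choice.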

We represent a rule of the form "If $E$ then $H$" graphically by the following structure:

(E) ----> (H)

Here a propositional statement is represented as a node, and an inference rule as an arc. A collection of rules about some specific subject area invariably uses the same pieces of evidence to imply several different hypotheses. It also frequently happens that several alternative pieces of evidence imply the same hypothesis. Furthermore, there are often chains of evidence and hypotheses. For these reasons it is natural to represent a collection of rules as a graph structure or inference net.
An example of an inference net is shown in Figure 1. The $H_i$ at the top of the net are alternative hypotheses to be resolved. Each arc entering a node represents an inference rule and has a strength associated with it. Notice that a typical intermediate node can play two roles: it provides supporting evidence for the nodes above it (such as $E_2$), and it acts as an hypothesis to be resolved by the evidence below it.

The main problem to be considered in this paper concerns the propagation of probabilities through the net. Suppose, for example, that a user of the net provides evidence by deciding that the probability of a node near the bottom of the net should be changed from its prior value to some new value. Obviously this should require updating the probability of the intermediate node above it and, in turn, the probabilities of the nodes above that one, and so on up to the hypotheses $H_i$. Any mechanism used for propagating probabilities must be able to cope with a number of problems. The rules have uncertainty associated with them, and the evidence provided by a user may be uncertain; these two different kinds of uncertainty must somehow be combined. Multiple pieces of evidence typically bear on a single hypothesis, so some form of independence must usually be assumed. Finally, the rules are provided subjectively by experts, so certain kinds of inconsistencies arise that can seriously jeopardize success.

FIGURE 1 A SIMPLE INFERENCE NET

In the following sections we suggest a Bayesian updating scheme that addresses these concerns.


III SUBJECTIVE BAYESIAN UPDATING

Suppose we are given a rule $E \rightarrow H$. Let us begin with the simplified problem of updating the probability of $H$ given its prior value and given that $E$ is observed to be true. By Bayes rule, we have

$$ P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)} \qquad (1) $$

For our purposes, a more convenient form of Bayes rule is arrived at by writing the complementary form for the negation of $H$,

$$ P(\bar{H} \mid E) = \frac{P(E \mid \bar{H})\, P(\bar{H})}{P(E)} \qquad (2) $$

and dividing Eq. (1) by Eq. (2) to obtain


$$ \frac{P(H \mid E)}{P(\bar{H} \mid E)} = \frac{P(E \mid H)}{P(E \mid \bar{H})} \cdot \frac{P(H)}{P(\bar{H})} \qquad (3) $$


Each of the three terms in this equation has a traditional interpretation. We define the prior odds on $H$ to be

$$ O(H) = \frac{P(H)}{P(\bar{H})} = \frac{P(H)}{1 - P(H)} \qquad (4) $$
and the posterior odds to be

$$ O(H \mid E) = \frac{P(H \mid E)}{P(\bar{H} \mid E)} = \frac{P(H \mid E)}{1 - P(H \mid E)} \qquad (5) $$

Now the likelihood ratio is defined by

$$ \lambda = \frac{P(E \mid H)}{P(E \mid \bar{H})} \qquad (6) $$


so Eq. (3) becomes the odds-likelihood formulation of Bayes rule:

$$ O(H \mid E) = \lambda\, O(H) \qquad (7) $$

This equation tells us how to update the odds on $H$ given the observation of $E$. For rule-based inference systems, we assume that a human expert has given the rule and has provided the likelihood ratio $\lambda$ to indicate the "strength" of the rule. A high value of $\lambda$ ($\lambda \gg 1$) represents, roughly speaking, the fact that $E$ is sufficient for $H$, since the observation that $E$ is true will transform indifferent prior odds on $H$ into heavy posterior odds in favor of $H$. Notice, incidentally, that the underlying probabilities can be recovered from their odds by the simple formula


$$ P = \frac{O}{O + 1} \qquad (8) $$

so that the odds and the probabilities give exactly the same information.
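As a minimal sketch of Eqs. (4), (7), and (8), the following Python fragment converts a prior probability to odds, multiplies by an expert-supplied $\lambda$, and converts back. The helper names and the numbers ($P(H) = 0.1$, $\lambda = 20$) are hypothetical, chosen only to show the arithmetic.

```python
def odds(p):
    """Eq. (4): O = P / (1 - P); requires 0 <= P < 1."""
    return p / (1.0 - p)

def prob(o):
    """Eq. (8): P = O / (O + 1)."""
    return o / (o + 1.0)

def update_on_true(prior_h, lam):
    """Eq. (7): P(H|E) when E is observed to be true, via O(H|E) = lam * O(H)."""
    return prob(lam * odds(prior_h))

# Hypothetical rule: prior P(H) = 0.1 and likelihood ratio lambda = 20.
posterior = update_on_true(0.1, 20.0)
print(round(posterior, 4))  # 0.6897, i.e. 20/29
```

Prior odds of 1:9 become posterior odds of 20:9, so in this example a rule with $\lambda = 20$ moves the hypothesis from 10% to roughly 69%.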

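Suppose now that we wish to update the odds on $H$ given that $E$ is observed to be false. In a strictly analogous fashion, we write

$$ O(H \mid \bar{E}) = \bar{\lambda}\, O(H) \qquad (9) $$

where we define $\bar{\lambda}$ by

$$ \bar{\lambda} = \frac{P(\bar{E} \mid H)}{P(\bar{E} \mid \bar{H})} = \frac{1 - P(E \mid H)}{1 - P(E \mid \bar{H})} \qquad (10) $$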


Notice that $\bar{\lambda}$ must also be provided by the human expert; it cannot be derived from $\lambda$ alone. A low value of $\bar{\lambda}$ ($0 < \bar{\lambda} \ll 1$) represents, roughly speaking, the fact that $E$ is necessary for $H$, since the observation that $E$ is false will by Eq. (9) transform indifferent prior odds on $H$ into odds heavily against $H$. Curiously, although $\lambda$ and $\bar{\lambda}$ must be separately provided by the expert, they are not completely independent of each other. In particular, Eqs. (6) and (10) yield

$$ \bar{\lambda} = \frac{1 - \lambda\, P(E \mid \bar{H})}{1 - P(E \mid \bar{H})} \qquad (11) $$


so that, if we exclude the extreme cases of $P(E \mid \bar{H})$ being either 0 or 1, we see that $\lambda > 1$ implies $\bar{\lambda} < 1$, and $\lambda < 1$ implies $\bar{\lambda} > 1$. Further, we have $\lambda = 1$ if and only if $\bar{\lambda} = 1$. This means that if the expert gives a rule such that the presence of $E$ enhances the odds on $H$ (i.e., $\lambda > 1$), he should also tell us that the absence of $E$ depresses the odds on $H$ (i.e., $\bar{\lambda} < 1$). To some extent, this mathematical requirement does violence to intuition. People who work with rule-based inference systems are commonly told by experts that "The presence of $E$ enhances the odds on $H$, but the absence of $E$ has no significance." In other words, the expert says that $\lambda > 1$ but $\bar{\lambda} = 1$. Subsequently, we shall suggest some modifications that address this and other problems of inconsistency.
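Eq. (11) can be checked numerically. The Python sketch below, with a hypothetical function name and hypothetical numbers, computes the $\bar{\lambda}$ implied by an expert's $\lambda$ together with $P(E \mid \bar{H})$, and shows the sign relationship stated above whenever $0 < P(E \mid \bar{H}) < 1$.

```python
def implied_lambda_bar(lam, p_e_given_not_h):
    """Eq. (11): lambda_bar = (1 - lam * P(E|~H)) / (1 - P(E|~H))."""
    q = p_e_given_not_h
    if not 0.0 < q < 1.0:
        raise ValueError("P(E|~H) must lie strictly between 0 and 1")
    if lam * q > 1.0:
        raise ValueError("lam * P(E|~H) = P(E|H) cannot exceed 1")
    return (1.0 - lam * q) / (1.0 - q)

# Hypothetical rule: lambda = 5, and E occurs in 10% of cases where H is false.
print(implied_lambda_bar(5.0, 0.1))   # about 0.556 (= 5/9), below 1
print(implied_lambda_bar(0.5, 0.2))   # about 1.125, above 1
print(implied_lambda_bar(1.0, 0.3))   # 1.0
```

No admissible value of $P(E \mid \bar{H})$ produces $\lambda > 1$ together with $\bar{\lambda} = 1$, which is exactly the combination experts often volunteer.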

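We note in passing that knowledge of both $\lambda$ and $\bar{\lambda}$ is equivalent to knowledge of both $P(E \mid H)$ and $P(E \mid \bar{H})$. Indeed, it follows at once from Eqs. (6) and (10) that

$$ P(E \mid H) = \lambda\, \frac{1 - \bar{\lambda}}{\lambda - \bar{\lambda}} \qquad (12) $$

and

$$ P(E \mid \bar{H}) = \frac{1 - \bar{\lambda}}{\lambda - \bar{\lambda}} \qquad (13) $$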

Thus, whether the expert should be asked to provide $\lambda$ and $\bar{\lambda}$, $P(E \mid H)$ and $P(E \mid \bar{H})$, or, indeed, some other equivalent information is a psychological rather than a mathematical question [7].
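Eqs. (12) and (13) also give a simple consistency check on an expert's pair $(\lambda, \bar{\lambda})$: the recovered conditional probabilities must be valid probabilities. A minimal Python sketch, with a hypothetical function name and hypothetical values, is:

```python
def conditionals_from_ratios(lam, lam_bar):
    """Eqs. (12)-(13): recover P(E|H) and P(E|~H) from lambda and lambda_bar."""
    if lam == lam_bar:
        raise ValueError("lambda == lambda_bar: Eqs. (12)-(13) are undefined")
    p_e_h = lam * (1.0 - lam_bar) / (lam - lam_bar)    # Eq. (12)
    p_e_not_h = (1.0 - lam_bar) / (lam - lam_bar)      # Eq. (13)
    # Reject pairs that correspond to no valid probabilities, excluding the
    # extreme values P(E|~H) = 0 or 1 as in the discussion of Eq. (11).
    if not (0.0 < p_e_not_h < 1.0 and 0.0 <= p_e_h <= 1.0):
        raise ValueError("inconsistent pair (lambda, lambda_bar)")
    return p_e_h, p_e_not_h

# Round trip with hypothetical values lambda = 5, lambda_bar = 5/9.
p, q = conditionals_from_ratios(5.0, 5.0 / 9.0)
print(round(p, 6), round(q, 6))  # 0.5 0.1
```

For instance, the "absence has no significance" pair $\lambda = 3$, $\bar{\lambda} = 1$ yields $P(E \mid \bar{H}) = 0$, which this check rejects.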

IV UNCERTAIN EVIDENCE AND THE PROBLEM OF PRIOR PROBABILITIES

Having seen how to update the probability of an hypothesis when the evidence is known to be either certainly true or certainly false, let us now consider how updating should proceed when the user of the system is uncertain. We begin by assuming that when a user says "I am 70% certain that $E$ is true," he means that $P(E \mid \text{relevant observations}) = 0.7$. We designate by $E'$ the relevant observations that he makes, and simply write $P(E \mid E')$ for the user's response.

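We now need to obtain an expression for $P(H \mid E')$. Formally,

$$ \begin{aligned} P(H \mid E') &= P(H, E \mid E') + P(H, \bar{E} \mid E') \\ &= P(H \mid E, E')\, P(E \mid E') + P(H \mid \bar{E}, E')\, P(\bar{E} \mid E') \end{aligned} \qquad (14) $$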

We make the reasonable assumption that if we know $E$ to be true (or false), then the observations $E'$ relevant to $E$ provide no further information about $H$. With this assumption, Eq. (14) becomes

$$ P(H \mid E') = P(H \mid E)\, P(E \mid E') + P(H \mid \bar{E})\, P(\bar{E} \mid E') \qquad (15) $$

Here $P(H \mid E)$ and $P(H \mid \bar{E})$ are obtained directly from Bayes rule, i.e., from Eq. (7) and Eq. (9), respectively.
If the user is certain that $E$ is true, then $P(H \mid E') = P(H \mid E)$. If the user is certain that $E$ is false, then $P(H \mid E') = P(H \mid \bar{E})$. In general, Eq. (15) gives $P(H \mid E')$ as a linear interpolation between these two extreme cases. In particular, note that if $P(E \mid E') = P(E)$, then $P(H \mid E') = P(H)$, since by the law of total probability $P(H) = P(H \mid E)\,P(E) + P(H \mid \bar{E})\,P(\bar{E})$. This has the simple interpretation that if the evidence $E'$ is no better than a priori knowledge, then application of the rule leaves the probability of $H$ unchanged.
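The interpolation of Eq. (15) and the property just noted can be seen on a small, deliberately consistent example. In the Python sketch below, the joint model ($P(H) = 0.1$, $P(E \mid H) = 0.5$, $P(E \mid \bar{H}) = 0.1$) and the function name are hypothetical; $P(E)$, $P(H \mid E)$, and $P(H \mid \bar{E})$ are all derived from that model, so they agree with one another.

```python
def interpolate(p_h_given_e, p_h_given_not_e, p_e_given_obs):
    """Eq. (15): P(H|E') = P(H|E) P(E|E') + P(H|~E) [1 - P(E|E')]."""
    return p_h_given_e * p_e_given_obs + p_h_given_not_e * (1.0 - p_e_given_obs)

# Hypothetical, internally consistent model.
p_h, p_e_h, p_e_not_h = 0.1, 0.5, 0.1
p_e = p_e_h * p_h + p_e_not_h * (1.0 - p_h)            # P(E) = 0.14
p_h_given_e = p_e_h * p_h / p_e                        # Bayes rule, about 0.357
p_h_given_not_e = (1.0 - p_e_h) * p_h / (1.0 - p_e)    # about 0.058

for x in (0.0, p_e, 1.0):
    print(round(x, 3), round(interpolate(p_h_given_e, p_h_given_not_e, x), 4))
# prints:
# 0.0 0.0581
# 0.14 0.1
# 1.0 0.3571
```

The middle line is the case $P(E \mid E') = P(E)$: the rule returns exactly the prior $P(H) = 0.1$.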

In a pure Bayesian formulation, Eq. (15) is the solution to the updating question. In practice, however, there are significant difficulties in using this formulation in an inference net. These difficulties stem from a combination of the classical Bayesian dilemma over prior probabilities and the use of subjective probabilities.

To appreciate the difficulty, consider again a typical pair of nodes $E$ and $H$ embedded in an inference net. It is apparent from Eqs. (7) and (9) that the updating procedure depends on the availability of the prior odds $O(H)$. Thus, although we have not emphasized the point until now, we see that the expert must be depended upon to provide the prior odds as well as $\lambda$ and $\bar{\lambda}$ when the inference rule is given. On the other hand, recall our earlier observation that $E$ also acts as an hypothesis to be resolved by the nodes below it in the net. Thus, the expert must also provide prior odds on $E$. If all of these quantities were specified consistently, then the situation would be as represented in Figure 2. The straight line plotted is simply Eq. (15), and shows the interpolation noted above. In particular, note that if the user asserts that $P(E \mid E') = P(E)$, then the updated probability is $P(H \mid E') = P(H)$. In other words, if the user provides no new evidence, then the probability of $H$ remains unchanged.

FIGURE 2 IDEALIZED UPDATING OF $P(H \mid E')$ [updated probability of H, $P(H \mid E')$, plotted against the current probability of E, $P(E \mid E')$, from 0 to 1]

In the practical case, unfortunately, the subjectively obtained prior probabilities are virtually certain to be inconsistent, and the situation becomes as shown in Figure 3. Note that $P(E)$, the prior probability provided by the expert, is different from $P_c(E)$, the probability consistent with $P(H)$. Here, if the user provides no new evidence, i.e., if $P(E \mid E') = P(E)$, then the formal Bayesian updating scheme will substantially change the probability of $H$ from its prior value $P(H)$. Furthermore, for the case shown in Figure 3, if the user asserts that $E$ is true with a probability $P(E \mid E')$ lying in the interval between $P(E)$ and $P_c(E)$, then the updated probability $P(H \mid E')$ will be less than $P(H)$. Thus, we have here an example of a rule intended to increase the probability of $H$ if $E$ is found to be true, but which turns out to have the opposite effect. This type of error can be compounded as probabilities are propagated through the net.
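The reversal just described can be reproduced numerically. The Python sketch below keeps $P(H \mid E)$ and $P(H \mid \bar{E})$ from a hypothetical consistent model (for which $P_c(E) = 0.14$) but replaces $P(E)$ by a hypothetical subjective prior of 0.05. The helper names are ours, and $P_c(E)$ is computed as the value of $P(E \mid E')$ at which the line of Eq. (15) passes through $P(H)$.

```python
def interpolate(p_h_given_e, p_h_given_not_e, p_e_given_obs):
    """Eq. (15): the straight line of Figures 2 and 3."""
    return p_h_given_e * p_e_given_obs + p_h_given_not_e * (1.0 - p_e_given_obs)

def consistent_prior_e(p_h, p_h_given_e, p_h_given_not_e):
    """Value of P(E|E') at which Eq. (15) returns exactly P(H)."""
    return (p_h - p_h_given_not_e) / (p_h_given_e - p_h_given_not_e)

p_h = 0.1
p_h_given_e = 0.05 / 0.14        # from a model with P(E|H) = 0.5, P(E|~H) = 0.1
p_h_given_not_e = 0.05 / 0.86
p_c = consistent_prior_e(p_h, p_h_given_e, p_h_given_not_e)   # about 0.14

expert_p_e = 0.05   # hypothetical subjective prior, inconsistent with P(H)
user_p_e = 0.10     # the user is MORE confident in E than the expert's prior

no_news = interpolate(p_h_given_e, p_h_given_not_e, expert_p_e)
some_news = interpolate(p_h_given_e, p_h_given_not_e, user_p_e)
print(round(p_c, 4), round(no_news, 4), round(some_news, 4))  # 0.14 0.0731 0.088
```

Both updated values fall below $P(H) = 0.1$: asserting nothing new lowers the probability of $H$, and so does asserting that $E$ is more likely than the expert believed, because 0.10 lies between $P(E) = 0.05$ and $P_c(E) = 0.14$.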

FIGURE 3 INCONSISTENT PRIORS [axes as in Figure 2; the vertical axis is marked at $P(H \mid \bar{E})$, $P(H)$, and $P(H \mid E)$]

Several measures can be taken to correct the unfortunate effects of priors that are inconsistent with inference rules. Since the problem can be thought of as one of overspecification, one approach would be to relax the specification of whatever quantities are subjectively least certain. For example, if the subjective specification of $P(E)$ were least certain (in the expert's opinion), then we might set $P(E) = P_c(E)$. This approach leads to difficulties because the pair of nodes $E$ and $H$ under consideration is embedded in a large net. For example, in Figure 1, we might be considering node $E_2$ as the hypothesis $H$, and an intermediate node below it as the evidence $E$. If we were to establish a prior probability for that intermediate node to be consistent with $P(E_2)$, we would simultaneously make it inconsistent with the priors on the nodes below it, which provide supporting evidence for it. Prior probabilities therefore cannot be forced into consistency on the basis of the local structure of the inference net; apparently, a more global process, perhaps a relaxation process, would be required.
A second alternative for achieving consistency would be to adjust the linear interpolation function shown in Figure 3. There are several possibilities, one of which is illustrated in Figure 4a. The linear function has been broken into a piecewise linear function at the coordinates of the prior probabilities, forcing consistent updating of the probability of $H$ given $E'$. Two other possibilities are shown in Figures 4b and 4c. In Figure 4b we have introduced a dead zone over the interval between the specified prior probability $P(E)$ and the consistent prior $P_c(E)$. Intuitively, the argument in support of this consistent interpolation function is that if the user cannot give a response outside this interval, then he is not sufficiently certain of his response to warrant any change in the probability of $H$. Figure 4c shows another possibility, motivated by the earlier observation that experts often give rules of the form "The presence of $E$ enhances the odds on $H$, but the absence of $E$ has no significance." By keeping $P(H \mid E')$ equal to $P(H)$ when $P(E \mid E')$ is less than $P(E)$, we are effectively allowing the forbidden situation where $\lambda > 1$ and $\bar{\lambda} = 1$. In effect, this is equivalent to the method illustrated in Figure 4a under the assumption that $P(H \mid \bar{E}) = P(H)$.

FIGURE 4 CONSISTENT INTERPOLATION FUNCTIONS [three panels, (a), (b), and (c), each plotting $P(H \mid E')$ against $P(E \mid E')$; panel (b) is marked at $P(E)$ and $P_c(E)$]

It is interesting to compare these modifications with the procedure used by Shortliffe to handle uncertain evidence in the MYCIN system [4,5]. While the nonlinear equations that result from use of Shortliffe's version of confirmation theory prevent a general comparison, it is possible to express his procedure in our terms for the special case of a single rule. The result for the case in which the presence of $E$ supports $H$ is shown in Figure 5. Clearly, the solution is identical to that of Figure 4c except for the interval from $P(E)$ to the threshold $P_t(E)$ given in Figure 5, within which Shortliffe's solution maintains $P(H \mid E')$ at the a priori value $P(H)$.


The graphical representations in Figures 2 through 4 provide a convenient vehicle for visualizing the discrepancies between formal and subjective Bayesian updating, and make it easy to invent other alternatives for reconciling inconsistencies. For completeness, the Appendix contains the easily computable algebraic representations of these functions, and also treats the complementary case in which the straight line given by Eq. (15) has a negative slope (the case in which $\lambda < \bar{\lambda}$). In a small experimental system, the function shown in Figure 4a has given satisfactory preliminary results [8].
FIGURE 5 THE INTERPOLATION FUNCTION USED IN THE MYCIN SYSTEM [$P(H \mid E')$ plotted against $P(E \mid E')$]. Here $P_t(E) = P(E) + t[1 - P(E)]$; typically, $t = 0.2$.

V THE USE OF MULTIPLE EVIDENCE

We turn now to the more general updating problem in which several rules of the form $E_i \rightarrow H$ all concern the same hypothesis $H$.* Since most nodes in actual inference nets have several incoming arcs, this is the case of greatest practical interest. In order to gain some insight into how multiple evidence should be used to update $H$ when the evidence is uncertain and the priors are inconsistent, let us first consider briefly how updating would formally proceed in simpler cases.

Suppose the $i$th inference rule has associated with it the usual two quantities $\lambda_i$ and $\bar{\lambda}_i$. For a first simple case, how should $H$ be updated when all the $E_i$ have been observed to be certainly true? This case is analogous to the case summarized by Eq. (7). Under the assumption that the pieces of evidence are conditionally independent (i.e., that $P(E_1, \ldots, E_n \mid H) = \prod_i P(E_i \mid H)$ and that $P(E_1, \ldots, E_n \mid \bar{H}) = \prod_i P(E_i \mid \bar{H})$), it is not difficult to reach an analogous answer. Specifically, the odds on $H$ are updated by the expression

$$ O(H \mid E_1, \ldots, E_n) = \left[ \prod_{i=1}^{n} \lambda_i \right] O(H) \qquad (16) $$

where

$$ \lambda_i = \frac{P(E_i \mid H)}{P(E_i \mid \bar{H})} \qquad (17) $$

* This should not be confused with the conjunctive premise mentioned earlier.


Similarly, if all the evidence is observed to be certainly false, we can, under conditional independence assumptions, again factor the joint likelihood ratio to obtain

$$ O(H \mid \bar{E}_1, \ldots, \bar{E}_n) = \left[ \prod_{i=1}^{n} \bar{\lambda}_i \right] O(H) \qquad (18) $$
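For certain evidence, Eqs. (16) and (18) reduce updating to multiplying likelihood ratios. The Python sketch below (Python 3.8 or later, for `math.prod`; function names and numbers are hypothetical) also checks Eq. (16) against a direct application of Bayes rule to a model in which two pieces of evidence are conditionally independent given $H$ and given $\bar{H}$.

```python
import math

def odds_all_true(prior_odds, lambdas):
    """Eq. (16): O(H | E_1..E_n) = (prod lambda_i) * O(H)."""
    return math.prod(lambdas) * prior_odds

def odds_all_false(prior_odds, lambda_bars):
    """Eq. (18): O(H | ~E_1..~E_n) = (prod lambda_bar_i) * O(H)."""
    return math.prod(lambda_bars) * prior_odds

# Hypothetical model: P(H) = 0.1, two conditionally independent pieces of evidence.
p_h = 0.1
p_e_h = [0.8, 0.5]        # P(E_i | H)
p_e_not_h = [0.2, 0.2]    # P(E_i | ~H)
lambdas = [a / b for a, b in zip(p_e_h, p_e_not_h)]                   # Eq. (17): 4 and 2.5
lambda_bars = [(1 - a) / (1 - b) for a, b in zip(p_e_h, p_e_not_h)]   # Eq. (10): 0.25 and 0.625

o = odds_all_true(p_h / (1 - p_h), lambdas)
joint_h = p_h * math.prod(p_e_h)
joint_not_h = (1 - p_h) * math.prod(p_e_not_h)
direct = joint_h / (joint_h + joint_not_h)      # Bayes rule on the full joint model
print(round(o / (1 + o), 4), round(direct, 4))  # 0.5263 0.5263
```

If both pieces of evidence were instead observed to be false, Eq. (18) would give posterior odds of $0.25 \times 0.625 \times (1/9) \approx 0.0174$ in this example.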


Now let us consider the general case of uncertain evidence and inconsistent prior probabilities. We already know that the posterior odds $O(H \mid E'_i)$ given a single observation can be computed using updating functions like the ones shown in Figure 4. We can therefore define, for a single inference rule, an effective likelihood ratio by

$$ \lambda'_i = \frac{O(H \mid E'_i)}{O(H)} \qquad (19) $$


By making the assumption now that the $E'_i$ are independent, we can obtain for the general case an expression similar to the simple updating formulas given by Eqs. (16) and (18):

$$ O(H \mid E'_1, \ldots, E'_n) = \left[ \prod_{i=1}^{n} \lambda'_i \right] O(H) \qquad (20) $$


To use this expression in an inference net system, we simply store with each node its prior odds (or probability), and store with each incoming arc an effective likelihood ratio $\lambda'_i$. Whenever a piece of evidence provided by the user causes $P(E_i \mid E'_i)$ to be updated, a new effective likelihood ratio is computed and the posterior odds in favor of $H$ are computed using Eq. (20). This procedure has the following consequences:


(1) If no evidence is obtained for a rule, then it will retain an initial effective likelihood ratio of unity, since prior and "posterior" odds are the same.

(2) The order in which evidence is obtained and rules are applied does not affect the final posterior probabilities.

(3) The same rule can be used repeatedly, with the same or different values for the probability of the evidence. In particular, if a user changes his mind and modifies an earlier assertion, the new assertion will correctly "undo" any effects of earlier statements.

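A minimal sketch of this bookkeeping, in Python, follows. The `HypothesisNode` class and its method names are hypothetical. Each single-rule posterior $P(H \mid E'_i)$ is taken as an input, since it would come from whichever Figure 4 interpolation function a system adopts; the node stores one effective likelihood ratio per incoming rule (Eq. (19)) and combines them with Eq. (20). The usage lines exercise the three consequences listed above.

```python
import math

def odds(p):
    return p / (1.0 - p)

def prob(o):
    return o / (o + 1.0)

class HypothesisNode:
    """Hypothetical node holding prior odds and one effective ratio per incoming arc."""

    def __init__(self, prior_prob, n_rules):
        self.prior_odds = odds(prior_prob)
        self.effective = [1.0] * n_rules   # consequence (1): no evidence -> ratio 1

    def set_rule_posterior(self, i, p_h_given_obs):
        """Eq. (19): lambda'_i = O(H|E'_i) / O(H), from a single-rule update."""
        self.effective[i] = odds(p_h_given_obs) / self.prior_odds

    def posterior(self):
        """Eq. (20), returned as a probability."""
        return prob(math.prod(self.effective) * self.prior_odds)

a = HypothesisNode(0.1, 2)
a.set_rule_posterior(0, 0.3)
a.set_rule_posterior(1, 0.2)

b = HypothesisNode(0.1, 2)          # same evidence, opposite order: consequence (2)
b.set_rule_posterior(1, 0.2)
b.set_rule_posterior(0, 0.3)
print(abs(a.posterior() - b.posterior()) < 1e-12)   # True

# Consequence (3): the user retracts rule 0's evidence, so that rule's
# single-rule posterior returns to the prior P(H) = 0.1.
a.set_rule_posterior(0, 0.1)
print(round(a.posterior(), 6))      # 0.2
```

Because each arc's ratio is recomputed from scratch rather than accumulated, a revised assertion simply overwrites the earlier one.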

VI CONCLUSIONS

The probability updating procedure presented here has several points to recommend it. It accepts subjective information that can readily be obtained from experts. The two conditional probabilities, $P(E \mid H)$ and $P(E \mid \bar{H})$, that determine the strength of an inference rule typically are intuitively meaningful measures, and the procedure is tolerant of the inevitable inconsistencies in subjective expert information. The grounding of our procedure in probability theory provides a useful theoretical foundation for calculating the effects of uncertain evidence. One value of theory is that it makes us explicitly aware of certain underlying assumptions about such matters as conditional independence, prior probabilities, and inconsistent information. Finally, our procedure is computationally straightforward and can be readily implemented in inference net systems.

There are, however, some questions that remain to be dealt with. If the network contains multiple paths linking a given piece of evidence to the same hypothesis, the independence assumption is obviously violated. It is important to settle on a reasonable (if ad hoc) modification of our basic procedure that behaves appropriately in such situations. (A more extreme complication would involve being able to avoid the circular reasoning implied by inference nets with loops.)

There are sometimes cases where some of the nodes in an inference net are related by a constraint not expressed in any given rule. For example, a subset of hypotheses may be mutually exclusive and exhaustive, in which case their probabilities must always sum to one. Such a constraint may be inconsistent with the associated rule strengths given us by the experts. Perhaps a simple expedient, such as renormalization of probability values, can be justified in this case.

We have not addressed here at all the issues of inference net control strategy: for example, which hypotheses should be pursued and which evidence should be sought at any step. The answers to these sorts of questions may be heavily dependent on the particular application.

Another global question concerns rules containing logical statements that may include quantifiers and variables. But in whatever way these questions are answered, the basic updating procedure presented here would appear to be a useful component of rule-based inference systems.
APPENDIX

Complete analytical expressions giving $P(H \mid E')$ as a piecewise linear function of $P(E \mid E')$ are given in this Appendix. These expressions correspond to the three graphical representations illustrated in Figure 4. The simplest expression corresponds to Figure 4a:

$$ P(H \mid E') = \begin{cases} P(H \mid \bar{E}) + \left[ P(H) - P(H \mid \bar{E}) \right] \dfrac{P(E \mid E')}{P(E)}, & 0 \le P(E \mid E') \le P(E) \\[2ex] \dfrac{P(H) - P(H \mid E)\,P(E)}{1 - P(E)} + P(E \mid E')\, \dfrac{P(H \mid E) - P(H)}{1 - P(E)}, & P(E) \le P(E \mid E') \le 1 \end{cases} \qquad (A1) $$


Here it is important to note that the four quantities $P(H)$, $P(E)$, $P(H \mid E)$, and $P(H \mid \bar{E})$ are assumed to be estimates obtained from experts. Were the true probabilities to be used in this formula, it would reduce at once to the linear expression given by Eq. (15). The estimates of $P(H \mid E)$ and $P(H \mid \bar{E})$ might be obtained directly from an expert, but would more often be obtained through Bayes rule [Eqs. (7) and (9), respectively]. To be explicit,


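$$ P(H \mid E) = \frac{P(E \mid H)\, P(H)}{\left[ P(E \mid H) - P(E \mid \bar{H}) \right] P(H) + P(E \mid \bar{H})} = \frac{\lambda\, P(H)}{(\lambda - 1)\, P(H) + 1} \qquad (A2) $$

and

$$ P(H \mid \bar{E}) = \frac{\left[ 1 - P(E \mid H) \right] P(H)}{\left[ P(E \mid \bar{H}) - P(E \mid H) \right] P(H) + 1 - P(E \mid \bar{H})} = \frac{\bar{\lambda}\, P(H)}{(\bar{\lambda} - 1)\, P(H) + 1} \qquad (A3) $$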
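Eqs. (A1) through (A3) translate directly into code. The Python sketch below uses hypothetical function names and hypothetical values ($P(H) = 0.1$, $\lambda = 5$, $\bar{\lambda} = 5/9$, for which the consistent prior is $P(E) = 0.14$). It checks the remark above: with the consistent $P(E)$, Eq. (A1) coincides with the straight line of Eq. (15), while with an inconsistent expert value the curve still passes through $(P(E), P(H))$.

```python
def p_h_given_e(lam, p_h):
    """Eq. (A2): P(H|E) = lam P(H) / ((lam - 1) P(H) + 1)."""
    return lam * p_h / ((lam - 1.0) * p_h + 1.0)

def p_h_given_not_e(lam_bar, p_h):
    """Eq. (A3): P(H|~E) = lam_bar P(H) / ((lam_bar - 1) P(H) + 1)."""
    return lam_bar * p_h / ((lam_bar - 1.0) * p_h + 1.0)

def update_fig4a(x, p_h, p_e, p_he, p_hne):
    """Eq. (A1): P(H|E') for x = P(E|E'); assumes 0 < P(E) < 1."""
    if x <= p_e:
        return p_hne + (p_h - p_hne) * x / p_e
    return (p_h - p_he * p_e) / (1.0 - p_e) + x * (p_he - p_h) / (1.0 - p_e)

p_h = 0.1
p_he = p_h_given_e(5.0, p_h)              # about 0.357
p_hne = p_h_given_not_e(5.0 / 9.0, p_h)   # about 0.058

# Consistent prior P(E) = 0.14: Eq. (A1) agrees with Eq. (15) everywhere.
for x in (0.0, 0.07, 0.14, 0.5, 1.0):
    line = p_he * x + p_hne * (1.0 - x)
    assert abs(update_fig4a(x, p_h, 0.14, p_he, p_hne) - line) < 1e-9

# Inconsistent expert prior P(E) = 0.3: the curve still passes through P(H).
print(round(update_fig4a(0.3, p_h, 0.3, p_he, p_hne), 6))   # 0.1
```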


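To obtain the equations for Figure 4b, we define $P_c(E)$, the value of $P(E \mid E')$ at which the straight line of Eq. (15) passes through $P(H)$, by

$$ P_c(E) = \frac{P(H) - P(H \mid \bar{E})}{P(H \mid E) - P(H \mid \bar{E})} \qquad (A4) $$

In general, this quantity will differ from the $P(E)$ value supplied by the expert. For Figure 4b we must distinguish between the two cases $P(E) \le P_c(E)$ and $P(E) > P_c(E)$. The equations are as follows: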


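Case 1: $P(E) \le P_c(E)$

$$ P(H \mid E') = \begin{cases} P(H \mid \bar{E}) + \left[ P(H) - P(H \mid \bar{E}) \right] \dfrac{P(E \mid E')}{P(E)}, & 0 \le P(E \mid E') \le P(E) \\[1ex] P(H), & P(E) \le P(E \mid E') \le P_c(E) \\[1ex] P(H \mid \bar{E}) + P(E \mid E') \left[ P(H \mid E) - P(H \mid \bar{E}) \right], & P_c(E) \le P(E \mid E') \le 1 \end{cases} \qquad (A5) $$

Case 2: $P(E) > P_c(E)$

$$ P(H \mid E') = \begin{cases} P(H \mid \bar{E}) + P(E \mid E') \left[ P(H \mid E) - P(H \mid \bar{E}) \right], & 0 \le P(E \mid E') \le P_c(E) \\[1ex] P(H), & P_c(E) \le P(E \mid E') \le P(E) \\[1ex] \dfrac{P(H) - P(H \mid E)\,P(E)}{1 - P(E)} + P(E \mid E')\, \dfrac{P(H \mid E) - P(H)}{1 - P(E)}, & P(E) \le P(E \mid E') \le 1 \end{cases} \qquad (A6) $$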
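The dead-zone function of Figure 4b, Eqs. (A4) through (A6), can be sketched the same way. The Python below is a minimal illustration under the assumptions that $P(H \mid E) \ne P(H \mid \bar{E})$ and $0 < P(E) < 1$; the function names and the hypothetical values (the same $P(H)$, $P(H \mid E)$, and $P(H \mid \bar{E})$ as before, so that $P_c(E) \approx 0.14$) are ours.

```python
def consistent_prior_e(p_h, p_he, p_hne):
    """Eq. (A4): P_c(E), where the line of Eq. (15) passes through P(H)."""
    return (p_h - p_hne) / (p_he - p_hne)

def update_fig4b(x, p_h, p_e, p_he, p_hne):
    """Eqs. (A5)-(A6): piecewise-linear update with a dead zone; x = P(E|E')."""
    p_c = consistent_prior_e(p_h, p_he, p_hne)
    line = p_hne + x * (p_he - p_hne)                  # Eq. (15)
    if p_e <= p_c:                                     # Case 1, Eq. (A5)
        if x <= p_e:
            return p_hne + (p_h - p_hne) * x / p_e
        if x <= p_c:
            return p_h                                 # dead zone
        return line
    if x <= p_c:                                       # Case 2, Eq. (A6)
        return line
    if x <= p_e:
        return p_h                                     # dead zone
    return (p_h - p_he * p_e) / (1.0 - p_e) + x * (p_he - p_h) / (1.0 - p_e)

p_h, p_he, p_hne = 0.1, 0.05 / 0.14, 0.05 / 0.86      # hypothetical; P_c(E) is about 0.14

for p_e in (0.05, 0.3):                               # Case 1, then Case 2
    grid = [i / 100 for i in range(101)]
    values = [update_fig4b(x, p_h, p_e, p_he, p_hne) for x in grid]
    # The consistent function never moves P(H|E') against the evidence.
    assert all(a <= b + 1e-12 for a, b in zip(values, values[1:]))
    print(p_e, round(update_fig4b(0.1, p_h, p_e, p_he, p_hne), 4))
# prints:
# 0.05 0.1
# 0.3 0.088
```

With the expert prior 0.05, a response of 0.10 falls in the dead zone and leaves $P(H)$ unchanged; with the expert prior 0.3, the same response lies below both $P_c(E)$ and $P(E)$, so it lowers $P(H)$, as it should.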


Finally, there are also two cases to be distinguished for Figure 4c. The first case corresponds to assuming that $P(H \mid \bar{E}) \approx P(H)$, so that $P_c(E) \approx 0$. The second case corresponds to assuming that $P(H \mid E) \approx P(H)$, so that $P_c(E) \approx 1$. In effect, these cases correspond to the rules $E \rightarrow H$ and $\bar{E} \rightarrow H$ taken separately. The corresponding equations are special cases of Eqs. (A5) and (A6):
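Case 1: $E \rightarrow H$

$$ P(H \mid E') = \begin{cases} P(H), & 0 \le P(E \mid E') \le P(E) \\[1ex] \dfrac{P(H) - P(H \mid E)\,P(E)}{1 - P(E)} + P(E \mid E')\, \dfrac{P(H \mid E) - P(H)}{1 - P(E)}, & P(E) \le P(E \mid E') \le 1 \end{cases} \qquad (A7) $$

Case 2: $\bar{E} \rightarrow H$

$$ P(H \mid E') = \begin{cases} P(H \mid \bar{E}) + \left[ P(H) - P(H \mid \bar{E}) \right] \dfrac{P(E \mid E')}{P(E)}, & 0 \le P(E \mid E') \le P(E) \\[1ex] P(H), & P(E) \le P(E \mid E') \le 1 \end{cases} \qquad (A8) $$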

Ordinarily one would view this as a simplified approximation that is useful when one of the two likelihood ratios is dominant. However, it is interesting to observe that if both $\lambda$ and $\bar{\lambda}$ are significant and if the two separate rules $E \rightarrow H$ and $\bar{E} \rightarrow H$ are treated as if $E$ and $\bar{E}$ were statistically independent, then Eqs. (A7) and (A8) yield the same result as Eq. (A1). This follows from the fact that when $P(H \mid E') = P(H)$ we have $O(H \mid E') = O(H)$, so that Eq. (19) yields $\lambda' = 1$. Thus, if $0 \le P(E \mid E') \le P(E)$, only the rule $\bar{E} \rightarrow H$ contributes to $P(H \mid E')$, while if $P(E) \le P(E \mid E') \le 1$, only the rule $E \rightarrow H$ contributes to $P(H \mid E')$, the contributions being exactly those given in Eq. (A1).
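The equivalence just described can be confirmed numerically. In the Python sketch below (hypothetical names, and the same hypothetical values as in the earlier sketches), the two one-sided rules of Eqs. (A7) and (A8) are converted to effective likelihood ratios with Eq. (19) and combined with Eq. (20); the result is compared with an independent implementation of Eq. (A1) for an inconsistent expert prior $P(E) = 0.3$.

```python
def odds(p):
    return p / (1.0 - p)

def prob(o):
    return o / (o + 1.0)

def rule_e_implies_h(x, p_h, p_e, p_he):
    """Eq. (A7): only the presence of E matters."""
    if x <= p_e:
        return p_h
    return (p_h - p_he * p_e) / (1.0 - p_e) + x * (p_he - p_h) / (1.0 - p_e)

def rule_not_e_implies_h(x, p_h, p_e, p_hne):
    """Eq. (A8): only the absence of E matters."""
    if x <= p_e:
        return p_hne + (p_h - p_hne) * x / p_e
    return p_h

def update_fig4a(x, p_h, p_e, p_he, p_hne):
    """Eq. (A1), written out directly."""
    if x <= p_e:
        return p_hne + (p_h - p_hne) * x / p_e
    return (p_h - p_he * p_e) / (1.0 - p_e) + x * (p_he - p_h) / (1.0 - p_e)

def combined(x, p_h, p_e, p_he, p_hne):
    """Eqs. (19)-(20): combine the two rules as if E and ~E were independent."""
    o_h = odds(p_h)
    lam1 = odds(rule_e_implies_h(x, p_h, p_e, p_he)) / o_h        # lambda' for E -> H
    lam2 = odds(rule_not_e_implies_h(x, p_h, p_e, p_hne)) / o_h   # lambda' for ~E -> H
    return prob(lam1 * lam2 * o_h)

p_h, p_e, p_he, p_hne = 0.1, 0.3, 0.05 / 0.14, 0.05 / 0.86       # hypothetical values
worst = max(
    abs(combined(k / 50, p_h, p_e, p_he, p_hne) - update_fig4a(k / 50, p_h, p_e, p_he, p_hne))
    for k in range(51)
)
print(worst < 1e-12)   # True
```

On each side of $P(E)$ one of the two effective ratios is exactly 1, so the product in Eq. (20) reduces to the single active rule, which is the argument given in the text.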